AD-A241  982 




Kullback-Leibler  Information  for  Ordering  Genes  Using 
Sperm  Typing  and  Radiation  Hybrid  Mapping 


Herman  Chernoff 


Harvard  University 

Mathematical  Sciences  Research  Institute 


Technical  Report  No.  ONR-C-9 


October,  1991 


Reproduction  in  whole  or  in  part  is  permitted  for  any 
purpose  of  the  United  States  Government. 

This  document  has  been  approved  for  public  release  and 
sale,  its  distribution  is  unlimited. 


91-13912 



Kullback-Leibler  Information  for  Ordering  Genes  Using 
Sperm  Typing  and  Radiation  Hybrid  Mapping 

Herman Chernoff
Harvard  University 

Mathematical  Sciences  Research  Institute 
ABSTRACT 

Two technologies applicable to gene mapping are sperm typing and
radiation hybrid mapping. Sperm typing makes use of the polymerase chain
reaction, a biochemical technique which allows enormous amplification
(production of multiple copies) of small, selected DNA fragments from a
single chromosome. A sample of sperm from a single donor is analyzed to
see which alleles (distinct forms of the various genes) are present in
the individual sperm. The frequencies with which the various
possibilities occur can be used to supply estimates of the ordering and
of the recombination probabilities among the genes for which that donor
is heterozygous (having different alleles of the same gene). Radiation
hybrid mapping employs a different technology, in which hybrid mouse
cells containing a human chromosome are subjected to a dose of radiation
that breaks the chromosome into segments, a fraction of which are
retained in succeeding generations. The simultaneous presence or absence
of various genes provides indirect information on how close together
these genes are, and also on the ordering of these genes.

For each of these methods, the analysis grows in complexity as the
number of genes being considered increases. At the same time the accuracy
of the probabilistic models used in the analysis becomes more
questionable. On the other hand, the ability to determine the order of
three genes may be enhanced by the inclusion, in the analysis, of the
data on nearby genes. For both of these methods, Kullback-Leibler
information numbers are derived to test hypotheses involving the order of
m genes. These information numbers are computed for testing hypotheses
concerning the ordering of three genes with and without considering the
presence of data involving other nearby genes.
1 Introduction

Two technologies applicable to gene mapping are sperm typing and
radiation hybrid mapping. Sperm typing makes use of the polymerase
chain reaction, a biochemical technique which allows enormous
amplification (production of multiple copies) of small, selected DNA
fragments from a single chromosome. A sample of sperm from a single
donor is analyzed to see which alleles (distinct forms of the various
genes) are present in the individual sperm. The frequencies with which
the various possibilities occur can be used to supply estimates of the
ordering and of the recombination probabilities among the genes for
which that donor is heterozygous (having different alleles of the same
gene). Radiation hybrid mapping employs a different technology, in
which hybrid mouse cells containing a human chromosome are subjected
to a dose of radiation that breaks the chromosome into segments, a
fraction of which are retained in succeeding generations. The
simultaneous presence or absence of various genes provides indirect
information on how close together these genes are, and also on the
ordering of these genes.

For each of these methods, the analysis grows in complexity as the
number of genes being considered increases. At the same time the
accuracy of the probabilistic models used in the analysis becomes more
questionable. On the other hand, the ability to determine the order
of three genes may be enhanced by the inclusion, in the analysis, of
the data on nearby genes. For both of these methods, we shall examine
the relevant Kullback-Leibler information numbers for hypotheses
concerning the ordering of three genes with and without considering
the presence of data involving other nearby genes.

In Section 2 we introduce the model for sperm typing and discuss
the maximum likelihood estimates of the recombination probabilities.
In Section 3 we derive expressions for the relevant Kullback-Leibler
informations for sperm typing. In Section 4 we describe the model for
radiation hybrids and derive the corresponding information numbers.
The outcome of the calculations is described in Section 5. We
terminate this introduction with a brief discussion of the
Kullback-Leibler (KL) information.

Given two simple hypotheses concerning the (density) distribution
$f(x)$ of the data $X$, $H_0 : f(x) = f_0(x)$ and $H_1 : f(x) = f_1(x)$,
the KL information for discriminating between $H_0$ and $H_1$ is

$$K(f_0, f_1) = E_{f_0}\{\log[f_0(X)/f_1(X)]\}. \tag{1}$$

The subscript $f_0$ refers to the fact that the expectation is
calculated for the case where the distribution of $X$ is governed by
$f_0$. The information $K$ measures the exponential rate at which the
posterior probability of $H_1$ approaches zero when $H_0$ is true, as
independent observations on $X$ are obtained. It is particularly
relevant in the design of sequential experiments, such as were
discussed by Goradia and Lange (1990). Suppose now that under our model
the density of $X$ can be described in terms of a parameter $\theta$,
i.e. $f(x) = f(x, \theta)$, that the underlying probability
distribution is governed by $\theta = \theta_0$, and that we are
interested in a composite alternative $H_1 : \theta \in \Omega_1$ to
the true hypothesis $H_0 : \theta = \theta_0$. Then the appropriate
measure is

$$K(H_0, H_1) = \inf_{\theta_1 \in \Omega_1} E_{\theta_0}\{\log[f(X, \theta_0)/f(X, \theta_1)]\} \tag{2}$$

which can be decomposed into the following difference if either term
is finite:

$$K(H_0, H_1) = E_{\theta_0}\{\log f(X, \theta_0)\} - \sup_{\theta_1 \in \Omega_1} E_{\theta_0}\{\log f(X, \theta_1)\}.$$

We shall suppress the subscript $\theta_0$ when there is no danger of
ambiguity.
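
As an illustration of definitions (1) and (2), the following minimal
sketch evaluates the KL information for discrete distributions. The
function names `kl_information` and `kl_composite`, the dictionary
representation, and the two-point example distributions are assumptions
made for the illustration; the composite alternative is represented
here by a finite list, which only stands in for the infimum over
$\Omega_1$.

```python
import math


def kl_information(f0, f1):
    """K(f0, f1) = E_{f0} log[f0(X)/f1(X)] for discrete distributions.

    f0 and f1 are dicts mapping outcomes to probabilities.
    """
    total = 0.0
    for x, p0 in f0.items():
        if p0 == 0.0:
            continue  # outcomes with f0(x) = 0 contribute nothing
        p1 = f1.get(x, 0.0)
        if p1 == 0.0:
            return math.inf  # f1 excludes an outcome that f0 allows
        total += p0 * math.log(p0 / p1)
    return total


def kl_composite(f0, alternatives):
    """Smallest K(f0, f1) over a finite list of alternative distributions."""
    return min(kl_information(f0, f1) for f1 in alternatives)


# Hypothetical example distributions (not taken from the report).
f0 = {"a": 0.5, "b": 0.5}
f1 = {"a": 0.9, "b": 0.1}
f2 = {"a": 0.6, "b": 0.4}

k01 = kl_information(f0, f1)
k_comp = kl_composite(f0, [f1, f2])
```

The composite value is driven by the alternative closest to $f_0$,
which is why the hardest alternative to reject determines the rate.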

2 The sperm typing model and maximum likelihood

Consider first the case of three genes for which the donor is
heterozygous, and his two chromosomes have genes ABC and abc
respectively. A sperm will have a chromosome providing one of the 8
following observations, ABC, ABc, AbC, Abc, aBC, aBc, abC, abc, with
probabilities depending on the recombination probabilities and the
ordering of the three genes on the chromosome. Suppose that the genes
appeared in the order ABC rather than ACB or BAC. Suppose also that
the recombination probability (indicating the probability that, in
the reproduction process, the chromosomes would separate and
recombine) between A and B is $\phi_{ab}$ and between B and C is
$\phi_{bc}$. Finally suppose that the recombination events are
independent. Then the probabilities associated with ABC, abc, and AbC
would be $(1 - \phi_{ab})(1 - \phi_{bc})/2$,
$(1 - \phi_{ab})(1 - \phi_{bc})/2$, and $\phi_{ab}\phi_{bc}/2$
respectively.

The probabilities associated with the other 5 events can be calculated
similarly.
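
The remaining probabilities can be tabulated mechanically. The sketch
below is only an illustration under the assumptions stated in the text
(order ABC, parental chromosomes ABC and abc, independent
recombination); the function name `haplotype_probs` and the example
values 0.1 and 0.2 are hypothetical.

```python
from itertools import product


def haplotype_probs(phi_ab, phi_bc):
    """Probabilities of the 8 sperm types for the order ABC.

    Assumes the donor's chromosomes carry ABC and abc, and that
    recombination between A-B and between B-C occurs independently.
    """
    probs = {}
    for a, b, c in product("Aa", "Bb", "Cc"):
        # A recombination shows up as a change of parental origin (case).
        rec_ab = a.isupper() != b.isupper()
        rec_bc = b.isupper() != c.isupper()
        p = 0.5  # either parental chromosome is equally likely at gene A
        p *= phi_ab if rec_ab else 1.0 - phi_ab
        p *= phi_bc if rec_bc else 1.0 - phi_bc
        probs[a + b + c] = p
    return probs


probs = haplotype_probs(0.1, 0.2)
```

Each type has the same probability as its mirror image (for example
ABC and abc), which is the symmetry used below to merge the 8
categories into 4.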

While the estimation of $\phi_{ab}$ and $\phi_{bc}$ is of interest and
relevant, our main focus in the next section will be on deciding which
is the correct one of the three possible orderings ABC, ACB, BAC. Note
that without reference to other parts of the chromosome the orderings
ABC and CBA are equivalent and we need consider only three, or
half of the six possible permutations of ABC. It is also evident that
the relevant information in the observed categories ABC and abc is
equivalent, and thus we may combine these two observations into one
equivalent one, ABC, with probability $(1 - \phi_{ab})(1 - \phi_{bc})$
under the ordering ABC, probability $(1 - \phi_{ac})(1 - \phi_{cb})$
under the ordering ACB, and probability
$(1 - \phi_{ba})(1 - \phi_{ac})$ under the ordering BAC.
Thus we need only consider 4 possible observations, e.g. ABC, ABc,
AbC, and Abc, each representing a pair of the original 8 categories.

In our analysis it would seem important to bear in mind that the
statistician does not know which alleles appear on the original
chromosomes. Thus, even with the order ABC, it might be that the
original chromosomes of the donor have AbC and aBc. For our problem
involving relatively small recombination probabilities, the data would
quickly and easily determine the form of the chromosome, for an
original chromosome with ABC would lead to a great preponderance of
the ABC observations independent of the order. Nevertheless it turns
out that symmetry aspects of the analysis make it unimportant to
hypothesize or estimate which alleles appear on each chromosome.

Goradia and Lange (1990) analyze two sequential methods of
selecting the correct order. They do not analyze the sequential
probability ratio method, since the two approaches that they use are
much easier for them to analyze. One may wonder whether there is a
substantial loss of efficiency in using their methods. The related
question that we address is whether there would be an increase in the
efficiency of deciding the order of ABC if the analysis were extended
to include 4 or 5 genes. Several complications arise in the use of KL
numbers to address this question. One is that in ordering 4 (or 5)
genes, there are 12 = 4!/2 (or 60 = 5!/2) possible orderings of
concern. Another issue is that it is more difficult to find a donor
who is heterozygous on four, rather than three, specified genes.
Finally, technical problems in the technology may make the simple
extension of the above probability model less reliable in the
application to four or more genes.

In  any  case,  when  the  KL  numbers  indicate  that  there  is  little  to 
be  gained  by  introducing  4  or  5  genes,  then  it  makes  sense  to  confine 
attention  to  three  at  a  time.  In  case  there  is  a  potential  gain  of  a  great 
amount  of  information,  then  one  ought  to  consider  the  relative  merit 
of  doing  the  possibly  more  complicated  analysis  required  to  deal  with 
more  than  3  genes. 

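Assuming the order ABC, the likelihood, based on $n_{ABC}$, $n_{ABc}$,
$n_{AbC}$ and $n_{Abc}$ observations ABC, ABc, AbC, Abc respectively, is

$$L(ABC) = \phi_{ab}^{n_{Ab}} (1 - \phi_{ab})^{n_{AB}} \phi_{bc}^{n_{Bc}} (1 - \phi_{bc})^{n_{BC}}$$

where

$$n_{AB} = n_{ABC} + n_{ABc} = n - n_{Ab}$$
$$n_{BC} = n_{ABC} + n_{Abc} = n - n_{Bc}$$

and

$$n = n_{ABC} + n_{ABc} + n_{AbC} + n_{Abc}$$

is the total number of observations. Here $n_{Ab} = n_{AbC} + n_{Abc}$
and $n_{Bc} = n_{ABc} + n_{AbC}$ count the observed recombinations
between A and B and between B and C. The corresponding maximum
likelihood estimates are

$$\hat{\phi}_{ab} = n_{Ab}/n$$

and

$$\hat{\phi}_{bc} = n_{Bc}/n,$$

yielding the likelihood

$$\hat{L}(ABC) = \{\hat{\phi}_{ab}^{\hat{\phi}_{ab}} (1 - \hat{\phi}_{ab})^{1 - \hat{\phi}_{ab}} \hat{\phi}_{bc}^{\hat{\phi}_{bc}} (1 - \hat{\phi}_{bc})^{1 - \hat{\phi}_{bc}}\}^n$$

with logarithm

$$\log \hat{L}(ABC) = -n\{V(\hat{\phi}_{ab}) + V(\hat{\phi}_{bc})\} \tag{3}$$

where, for $0 < x < 1$,

$$V(x) = -\{x \log x + (1 - x) \log(1 - x)\} \tag{4}$$

is an entropy.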

The likelihood corresponds to that calculated from observing the
two sets of independent binomials corresponding to the recombinations
from A to B and from B to C. Notice that if the original
chromosomes had AbC and aBc, the estimates of $\phi_{ab}$ and
$\phi_{bc}$ would be replaced by the complements
$1 - \hat{\phi}_{ab}$ and $1 - \hat{\phi}_{bc}$ and $\log \hat{L}$
would be unaltered.

These  results  help  to  understand  the  derivation  of  the  KL  numbers 
in  the  following  section  where  we  deal  with  expected  log  likelihoods 
for  n  =  1. 
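
The estimates and the maximized log likelihood (3) are simple to
compute from the four merged counts. The following is a minimal
sketch; the function names and the example counts are hypothetical,
and setting $V(0) = V(1) = 0$ (the limiting value) is an assumption
made so that the code handles samples with no observed recombination.

```python
import math


def V(x):
    """Entropy (4): V(x) = -(x log x + (1 - x) log(1 - x)), natural logs."""
    if x <= 0.0 or x >= 1.0:
        return 0.0  # limiting value, assumed here for the boundary cases
    return -(x * math.log(x) + (1.0 - x) * math.log(1.0 - x))


def fit_order_abc(n_ABC, n_ABc, n_AbC, n_Abc):
    """MLEs of phi_ab, phi_bc and the maximized log likelihood (3)."""
    n = n_ABC + n_ABc + n_AbC + n_Abc
    phi_ab_hat = (n_AbC + n_Abc) / n  # recombinations between A and B
    phi_bc_hat = (n_ABc + n_AbC) / n  # recombinations between B and C
    log_l = -n * (V(phi_ab_hat) + V(phi_bc_hat))
    return phi_ab_hat, phi_bc_hat, log_l


# Hypothetical counts for illustration only.
phi_ab_hat, phi_bc_hat, log_l = fit_order_abc(80, 10, 2, 8)
```

Because $V(x) = V(1 - x)$, replacing both estimates by their
complements leaves the value of `log_l` unchanged, as noted above.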

In generalizing to m genes, we could extend the alphabetic notation,
but it seems more convenient to change the notation slightly.
We label the genes 1 to m and consider those permutations,
$\pi = (\pi_1, \pi_2, \ldots, \pi_m)$, for which 1 appears in the first
half, or, in the case where m is odd, may appear in the center, but 2
appears in the first half. Thus, for m = 3, we have the permutations
(123, 132, 213) representing the 3 possible orderings.

A possible parametric point $\theta$ is described by a permutation
$\pi$ and a vector $\phi$ with components $\phi_{\pi_i,\pi_{i+1}}$ for
$1 \le i \le m - 1$, representing recombination probabilities. For the
time being this notation seems mildly ambiguous since $\phi_{12}$
associated with $\pi^0 = (1,2,3,4,5)$ and $\phi_{12}$ associated with
$\pi = (1,2,5,4,3)$ should be designated separately, possibly with
superscripts. Our observations will consist of n independent vectors
of the form $X = (X_1, X_2, \ldots, X_m)$ where the i-th component of
$X$ is zero or one depending on which allele of the i-th gene is
observed in the given sperm observation.
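
The m!/2 orderings can be enumerated directly. In the sketch below the
representative of each ordering is chosen as the lexicographically
smaller of a permutation and its reversal; this convention and the
function name `distinct_orderings` are assumptions of the illustration
rather than the rule stated above, although the two agree for m = 3.

```python
from itertools import permutations


def distinct_orderings(m):
    """One representative for each ordering of genes 1..m, up to reversal."""
    # A permutation and its reversal describe the same gene order, so keep
    # only the lexicographically smaller member of each such pair.
    return [p for p in permutations(range(1, m + 1)) if p <= p[::-1]]


orderings_3 = distinct_orderings(3)
count_4 = len(distinct_orderings(4))
count_5 = len(distinct_orderings(5))
```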
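Supposing that the true ordering is $\pi^0$, and given the associated
values of $\phi = (\phi_{12}, \phi_{23}, \ldots, \phi_{m-1,m})$, the
likelihood is easily seen to be (up to a factor that depends neither on
the ordering nor on $\phi$)

$$L = \prod_{i=1}^{m-1} \phi_{i,i+1}^{n_{i,i+1}} (1 - \phi_{i,i+1})^{n - n_{i,i+1}} \tag{5}$$

where $n_{ij}$ is the number of times that $X_i \ne X_j$ in the sample
of n observations. Then the maximum likelihood estimates are

$$\hat{\phi}_{i,i+1} = n_{i,i+1}/n, \quad 1 \le i \le m - 1 \tag{6}$$

and the maximum likelihood under the ordering $\pi^0$ satisfies

$$\log \hat{L}(\pi^0) = -n \sum_{i=1}^{m-1} V(\hat{\phi}_{i,i+1}). \tag{7}$$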

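Given an alternate permutation $\pi$, it is clear that the
corresponding MLE of the related recombination probabilities are given
by

$$\hat{\phi}_{\pi_i,\pi_{i+1}} = n_{\pi_i,\pi_{i+1}}/n, \quad 1 \le i \le m - 1 \tag{8}$$

and the maximum likelihood satisfies

$$\log \hat{L}(\pi) = -n \sum_{i=1}^{m-1} V(\hat{\phi}_{\pi_i,\pi_{i+1}}). \tag{9}$$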

Note that if $\pi = (1,2,5,4,3)$, the MLE of $\phi_{12}$ under the
ordering $\pi$ is exactly the same as that of $\phi_{12}$ under
$\pi^0$. Also, under the hypothesis
$H_0 : \theta = \theta_0 = (\pi^0, \phi)$ where
$\phi = (\phi_{12}, \phi_{23}, \ldots, \phi_{m-1,m})$ is specified, the
variables $n_{ij}$ are binomial random variables associated with
probabilities

$$\phi_{ij} = P\{X_i \ne X_j\}, \quad 1 \le i, j \le m. \tag{10}$$

Here, and later, we assume that $H_0$ applies and suppress the
subscript $\theta_0$ for $P$ and $E$. Then $\phi_{ij}$ is the
probability of an odd number of recombinations between the i-th and
j-th genes. Thus $\phi_{ii} = 0$, $\phi_{i,i+1} = \phi_{i+1,i}$, and,
for $1 \le i < j \le m - 1$,

$$\phi_{i,j+1} = \phi_{j+1,i} = \phi_{ij}(1 - \phi_{j,j+1}) + (1 - \phi_{ij})\phi_{j,j+1}. \tag{11}$$
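
Recursion (11) gives every pairwise probability from the adjacent
ones. The following minimal sketch, with the hypothetical function
name `pairwise_phi` and an illustrative input of three adjacent
probabilities equal to 0.1, builds the full table for genes labeled
1 to m.

```python
def pairwise_phi(adjacent):
    """Return {(i, j): phi_ij} for genes 1..m from adjacent probabilities.

    adjacent[k] holds phi_{k+1,k+2}, so len(adjacent) == m - 1.
    """
    m = len(adjacent) + 1
    phi = {(i, i): 0.0 for i in range(1, m + 1)}
    for i in range(1, m):
        for j in range(i, m):
            step = adjacent[j - 1]  # phi_{j,j+1}
            # Odd number of recombinations between i and j+1: exactly one
            # of the segments (i, j) and (j, j+1) has an odd number.
            value = phi[(i, j)] * (1.0 - step) + (1.0 - phi[(i, j)]) * step
            phi[(i, j + 1)] = phi[(j + 1, i)] = value
    return phi


phi = pairwise_phi([0.1, 0.1, 0.1])
```

With equal adjacent values 0.1 this gives $\phi_{13} = 0.18$ and
$\phi_{14} = 0.244$, in agreement with the closed form
$(1 - 0.8^k)/2$ for genes k steps apart.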
3 Kullback-Leibler information for sperm typing

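To calculate the KL numbers, consider the case n = 1. Then
$E\,n_{ij} = \phi_{ij}$,

$$E \log f(X, \theta_0) = E \sum_{i=1}^{m-1} [n_{i,i+1} \log \phi_{i,i+1} + (1 - n_{i,i+1}) \log(1 - \phi_{i,i+1})] = -\sum_{i=1}^{m-1} V(\phi_{i,i+1}).$$

For specified $\theta = (\pi, \phi')$,

$$E \log f(X, \theta) = E \sum_{i=1}^{m-1} [n_{\pi_i,\pi_{i+1}} \log(\phi'_{\pi_i,\pi_{i+1}}) + (1 - n_{\pi_i,\pi_{i+1}}) \log(1 - \phi'_{\pi_i,\pi_{i+1}})]$$
$$= \sum_{i=1}^{m-1} [\phi_{\pi_i,\pi_{i+1}} \log(\phi'_{\pi_i,\pi_{i+1}}) + (1 - \phi_{\pi_i,\pi_{i+1}}) \log(1 - \phi'_{\pi_i,\pi_{i+1}})]$$

which is maximized with respect to $\phi'$ by
$\phi'_{\pi_i,\pi_{i+1}} = \phi_{\pi_i,\pi_{i+1}}$.
Thus if $H_1$ corresponds to the composite hypothesis of the ordering
$\pi$, we would have

$$K(H_0, H_1) = \sum_{i=1}^{m-1} [V(\phi_{\pi_i,\pi_{i+1}}) - V(\phi_{i,i+1})]. \tag{12}$$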

»=1 

In particular, suppose that we are dealing with 3 genes and $H_1$
corresponds to the order (1,3,2). Then

$$K(H_0, H_1) = V(\phi_{13}) + V(\phi_{23}) - V(\phi_{12}) - V(\phi_{23}) = V(\phi_{13}) - V(\phi_{12})$$

whereas for $H_2$ corresponding to (2,1,3),

$$K(H_0, H_2) = V(\phi_{13}) - V(\phi_{23}).$$

Finally

$$K(H_0, H_1 \cup H_2) = V(\phi_{13}) - \max\{V(\phi_{12}), V(\phi_{23})\}. \tag{13}$$
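
Equation (13) can be evaluated in a few lines. This is a sketch only;
the function name `s3` is hypothetical, natural logarithms are
assumed, and $\phi_{13}$ is obtained from (11).

```python
import math


def V(x):
    """Entropy (4), with the limiting value 0 assumed at x = 0 and x = 1."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -(x * math.log(x) + (1.0 - x) * math.log(1.0 - x))


def s3(phi_12, phi_23):
    """KL number (13) for the true order (1,2,3) against both alternatives."""
    phi_13 = phi_12 * (1.0 - phi_23) + (1.0 - phi_12) * phi_23  # from (11)
    k_132 = V(phi_13) - V(phi_12)  # alternative order (1,3,2)
    k_213 = V(phi_13) - V(phi_23)  # alternative order (2,1,3)
    return min(k_132, k_213)


value_equal = s3(0.10, 0.10)
value_unequal = s3(0.02, 0.01)
```

For $\phi_{12} = \phi_{23} = 0.10$ this gives about 0.146, and for
(0.02, 0.01) about 0.035.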

More generally for m genes, let $H_0$ correspond to
$\pi^0 = (1, 2, \ldots, m)$ and
$\phi = \{\phi_{i,i+1} : 1 \le i \le m - 1\}$, and let $A$ be a subset
of the $m!/2 - 1$ other permutations corresponding to alternate
orders. Then

$$K(H_0, H_A) = \min_{\pi \in A} \left\{\sum_{i=1}^{m-1} V(\phi_{\pi_i,\pi_{i+1}})\right\} - \sum_{i=1}^{m-1} V(\phi_{i,i+1}). \tag{14}$$

We are mainly concerned with 3 cases. Given genes 1 to 5 with
$\{\phi_{i,i+1} : 1 \le i \le 4\}$, we have

case 1: m = 3, $\phi = (\phi_{23}, \phi_{34})$, $A$ is the set of 2
orderings of (1,2,3) other than (1,2,3), where genes 2, 3, 4 have been
relabeled 1, 2, 3

case 2: m = 4, $\phi = (\phi_{23}, \phi_{34}, \phi_{45})$, $A$ is the
set of 8 orderings of (1,2,3,4) inconsistent with the ordering (1,2,3)
or its equivalent (3,2,1), where genes 2, 3, 4, 5 have been relabeled
1, 2, 3, 4

case 3: m = 5, $\phi = (\phi_{12}, \phi_{23}, \phi_{34}, \phi_{45})$,
$A$ is the set of 40 orderings inconsistent with the ordering (2,3,4)
or (4,3,2).

These  three  cases  give  us  the  relevant  KL  numbers  for  the  ordering 
of  genes  2,  3,  and  4  when  considering  data  involving  (1)  the  three 
genes  (2,3,4),  (2)  the  four  genes  (2, 3,4,5),  and  (3)  the  five  genes 
(1,2, 3,4,5). 
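
For small m, formula (14) can be evaluated by brute-force enumeration
of the alternative orderings. The sketch below is one possible
implementation, not necessarily the procedure used for the tables; the
names `kl_order` and `focus` are hypothetical, and `focus` denotes the
three genes (in their true order) whose relative order is being tested.

```python
import math
from itertools import permutations


def V(x):
    """Entropy (4), with the limiting value 0 assumed at x = 0 and x = 1."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -(x * math.log(x) + (1.0 - x) * math.log(1.0 - x))


def pairwise_phi(adjacent):
    """{(i, j): phi_ij} for genes 1..m, built with recursion (11)."""
    m = len(adjacent) + 1
    phi = {(i, i): 0.0 for i in range(1, m + 1)}
    for i in range(1, m):
        for j in range(i, m):
            step = adjacent[j - 1]  # phi_{j,j+1}
            value = phi[(i, j)] * (1.0 - step) + (1.0 - phi[(i, j)]) * step
            phi[(i, j + 1)] = phi[(j + 1, i)] = value
    return phi


def kl_order(adjacent, focus):
    """KL number (14); A = orderings inconsistent with the order of focus."""
    m = len(adjacent) + 1
    phi = pairwise_phi(adjacent)

    def cost(order):
        return sum(V(phi[(order[i], order[i + 1])]) for i in range(m - 1))

    def consistent(order):
        sub = tuple(g for g in order if g in focus)
        return sub == focus or sub == focus[::-1]

    true_order = tuple(range(1, m + 1))
    alternatives = [p for p in permutations(true_order) if not consistent(p)]
    return min(cost(p) for p in alternatives) - cost(true_order)


s3 = kl_order([0.10, 0.10], (1, 2, 3))              # case 1
s4 = kl_order([0.10, 0.10, 0.10], (1, 2, 3))        # case 2
s5 = kl_order([0.10, 0.10, 0.10, 0.10], (2, 3, 4))  # case 3
```

With all adjacent probabilities equal to 0.10, the three cases give
approximately 0.146, 0.146 and 0.231.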

4  Radiation  hybrid  model 

Another technology for estimating distances along the chromosome
and for ordering genes is that of radiation hybrid mapping. Here again
we introduce the model via cases involving few genes. This model was
analyzed by Boehnke et al. (1991) and Lange and Boehnke (1991).
The technology consists of irradiating a hybrid mouse cell which
carries a human chromosome, thereby breaking the chromosome into
several fragments, a proportion $r = 1 - \bar{r}$ of which are
retained. The higher the rate of radiation, $\lambda$, the more
fragments are made.

We will assume that $r$ is known and that the distance $\delta$
between two genes A and B is unknown. Then the probability that the
two genes will be on separate fragments is

$$\phi = 1 - \exp(-\lambda\delta) \tag{15}$$

assuming that breaks occur like a Poisson process with rate $\lambda$.
The probabilities of observing, among the retained fragments, both A
and B, A alone, B alone, and neither are

$$P_{11} = r(1 - \bar{r}\phi)$$
$$P_{10} = r\bar{r}\phi$$
$$P_{01} = P_{10} \tag{16}$$
$$P_{00} = \bar{r}(1 - r\phi)$$

respectively, where $\bar{r} = 1 - r$. Here we have assumed that breaks
and retention events are independent, and that $r$ is constant.
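
The four probabilities in (16) can be checked numerically. The sketch
below is illustrative; the function name `pair_probs`, the dictionary
keys, and the example values are assumptions.

```python
import math


def pair_probs(lam, delta, r):
    """Outcome probabilities (16) for two genes at distance delta.

    lam is the break rate, r the retention probability of a fragment.
    """
    phi = 1.0 - math.exp(-lam * delta)  # separation probability (15)
    rbar = 1.0 - r
    return {
        "11": r * (1.0 - rbar * phi),  # both genes retained
        "10": r * rbar * phi,          # A alone
        "01": r * rbar * phi,          # B alone
        "00": rbar * (1.0 - r * phi),  # neither
    }


probs = pair_probs(lam=1.0, delta=0.3, r=0.4)
linked = pair_probs(lam=1.0, delta=0.0, r=0.4)
```

At $\delta = 0$ the two genes always share a fragment, so they are
retained or lost together with probabilities $r$ and $\bar{r}$.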

It is relatively easy to calculate the Fisher information for
estimating $\phi$. That is

$$J = E\left[\frac{\partial(\log \text{likelihood})}{\partial\phi}\right]^2 = \frac{r\bar{r}(2 - \phi)}{\phi(1 - r\phi)(1 - \bar{r}\phi)} \tag{17}$$

where $\bar{r} = 1 - r$. It follows that the Fisher information with
respect to the distance $\delta$ is

$$J^* = J\left(\frac{d\phi}{d\delta}\right)^2 = \frac{r\bar{r}\lambda^2(1 - \phi)^2(2 - \phi)}{\phi(1 - r\phi)(1 - \bar{r}\phi)}. \tag{18}$$

For small $\lambda\delta$, $\phi \approx \lambda\delta$ and
$J^* \approx 2r\bar{r}\lambda/\delta$. Insofar as $1/(nJ^*\delta^2)$ is
the asymptotic relative variance of the large sample estimate of
$\delta$, it gives us a clue about what values of $\lambda$ would be
useful for ordering the genes. Uncertainty in the knowledge of $r$
complicates matters somewhat. In that case the information matrix for
$\delta$ and $r$ should be evaluated and inverted.
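
Formulas (17) and (18) can be compared with a direct numerical
evaluation of the Fisher information from the outcome probabilities
(16). The following sketch is a check under the stated model; all
function names and the example values are hypothetical.

```python
import math


def outcome_probs(phi, r):
    """The four outcome probabilities (16) as a list."""
    rbar = 1.0 - r
    return [r * (1 - rbar * phi), r * rbar * phi,
            r * rbar * phi, rbar * (1 - r * phi)]


def fisher_phi(phi, r):
    """Closed form (17) for the information about phi."""
    rbar = 1.0 - r
    return r * rbar * (2 - phi) / (phi * (1 - r * phi) * (1 - rbar * phi))


def fisher_phi_numeric(phi, r, h=1e-6):
    """Sum over outcomes of (dP/dphi)^2 / P, using central differences."""
    up = outcome_probs(phi + h, r)
    down = outcome_probs(phi - h, r)
    mid = outcome_probs(phi, r)
    return sum(((u - d) / (2 * h)) ** 2 / p for u, d, p in zip(up, down, mid))


def fisher_delta(delta, lam, r):
    """Closed form (18): information about the distance delta."""
    phi = 1.0 - math.exp(-lam * delta)
    return (lam * (1.0 - phi)) ** 2 * fisher_phi(phi, r)


closed = fisher_phi(0.2, 0.4)
numeric = fisher_phi_numeric(0.2, 0.4)
# Ratio of (18) to the small-distance approximation 2 r rbar lam / delta.
ratio = fisher_delta(0.001, 1.0, 0.4) / (2 * 0.4 * 0.6 * 1.0 / 0.001)
```

The ratio is close to 1 for small $\lambda\delta$, as the
approximation suggests.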

To proceed with the ordering problem, suppose that the genes are
arranged in order $\pi^0 = (1, 2, \ldots, m)$ and that the distances
between successive genes $\delta_{i,i+1}$ give rise to separation
probabilities $\phi_{i,i+1}$. Then let the observation be a vector
$X = (X_1, X_2, \ldots, X_m)$ to indicate which genes are retained.
That is, $X_i = 1$ indicates retention of the i-th gene and otherwise
$X_i = 0$. Then $X$ is a Markov process where

$$f_X(x) = r^{x_1}\bar{r}^{1 - x_1} \prod_{i=1}^{m-1} g(x_i, x_{i+1}; \phi_{i,i+1}) \tag{19}$$

and, with $\bar{r} = 1 - r$,

$$g(1, 1; \tau) = 1 - \bar{r}\tau$$
$$g(1, 0; \tau) = \bar{r}\tau \tag{20}$$
$$g(0, 1; \tau) = r\tau$$
$$g(0, 0; \tau) = 1 - r\tau.$$

Further, for $i < j$,

$$P\{X_j = x_j \mid X_i = x_i\} = g(x_i, x_j; \phi_{ij}) \tag{21}$$

where

$$\phi_{ij} = 1 - \exp(-\lambda\delta_{ij}) \tag{22}$$

and

$$\delta_{ij} = \sum_{k=i}^{j-1} \delta_{k,k+1} \tag{23}$$

is the distance between the i-th and j-th genes.
We shall be interested in maximizing

$$w_{ij}(\tau) = E \log g(X_i, X_j; \tau)$$

with respect to $\tau$. Then, with $\bar{r} = 1 - r$,

$$w_{ij}(\tau) = r(1 - \bar{r}\phi_{ij}) \log(1 - \bar{r}\tau) + r\bar{r}\phi_{ij} \log(\bar{r}\tau) + \bar{r}r\phi_{ij} \log(r\tau) + \bar{r}(1 - r\phi_{ij}) \log(1 - r\tau)$$

and

$$\frac{dw_{ij}}{d\tau} = \frac{r\bar{r}(\phi_{ij} - \tau)(2 - \tau)}{\tau(1 - r\tau)(1 - \bar{r}\tau)}$$

vanishes only at $\tau = \phi_{ij}$ in the interval (0,1), and indeed,
$w_{ij}(\tau)$ attains its maximum value

$$W(\phi_{ij}) = r(1 - \bar{r}\phi_{ij}) \log(1 - \bar{r}\phi_{ij}) + r\bar{r}\phi_{ij} \log(r\bar{r}\phi_{ij}^2) + \bar{r}(1 - r\phi_{ij}) \log(1 - r\phi_{ij}) \tag{24}$$

at $\tau = \phi_{ij}$. Incidentally, this result could also be derived
without calculating the derivative, by noting the relationship between
$w_{ij}$ and a Kullback-Leibler number and that a KL number is always
nonnegative, and hence an expression of the form
$E_{\theta_0}[\log f(X, \theta)]$ attains its maximum value when
$\theta = \theta_0$.
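
The location of the maximum can be confirmed numerically. The sketch
below evaluates $w_{ij}(\tau)$ on a grid; the function names, the grid
spacing, and the values $\phi_{ij} = 0.3$ and $r = 0.4$ are chosen only
for illustration.

```python
import math


def w(tau, phi, r):
    """Expected log conditional probability w_ij(tau) when phi_ij = phi."""
    rbar = 1.0 - r
    return (r * (1 - rbar * phi) * math.log(1 - rbar * tau)
            + r * rbar * phi * math.log(rbar * tau)
            + rbar * r * phi * math.log(r * tau)
            + rbar * (1 - r * phi) * math.log(1 - r * tau))


def W(phi, r):
    """Maximum value (24), attained at tau = phi."""
    rbar = 1.0 - r
    return (r * (1 - rbar * phi) * math.log(1 - rbar * phi)
            + r * rbar * phi * math.log(r * rbar * phi ** 2)
            + rbar * (1 - r * phi) * math.log(1 - r * phi))


phi, r = 0.3, 0.4
grid = [k / 1000 for k in range(1, 1000)]  # tau in (0, 1)
best_tau = max(grid, key=lambda t: w(t, phi, r))
```

On this grid the maximizer coincides with $\phi_{ij}$, and
`w(phi, phi, r)` agrees with `W(phi, r)`.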

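We are now in position to calculate the KL information. Let
$H_0 : \theta = \theta_0$ correspond to the permutation
$\pi^0 = (1, 2, \ldots, m)$ and
$\phi = (\phi_{12}, \phi_{23}, \ldots, \phi_{m-1,m})$, and let
$H_1 : \theta = \theta_1$ correspond to the permutation $\pi^*$ and
$\phi^* = (\phi^*_{\pi_1\pi_2}, \ldots, \phi^*_{\pi_{m-1}\pi_m})$.
Then, with $E_{\theta_0}$ represented by $E$, we have

$$K(H_0, H_1) = E \log f(X, \theta_0) - E \log f(X, \theta_1)$$
$$= E \log\left[r^{X_1}\bar{r}^{1 - X_1} \prod_{i=1}^{m-1} g(X_i, X_{i+1}; \phi_{i,i+1})\right] - E \log\left[r^{X_{\pi_1}}\bar{r}^{1 - X_{\pi_1}} \prod_{i=1}^{m-1} g(X_{\pi_i}, X_{\pi_{i+1}}; \phi^*_{\pi_i\pi_{i+1}})\right]$$
$$= -V(r) + \sum_{i=1}^{m-1} W(\phi_{i,i+1}) + V(r) - \sum_{i=1}^{m-1} w_{\pi_i\pi_{i+1}}(\phi^*_{\pi_i\pi_{i+1}})$$

which is minimized with respect to $\phi^*$ by
$\phi^*_{\pi_i\pi_{i+1}} = \phi_{\pi_i\pi_{i+1}}$. Thus for $H_1$
corresponding to the ordering $\pi$,

$$K(H_0, H_1) = \sum_{i=1}^{m-1} W(\phi_{i,i+1}) - \sum_{i=1}^{m-1} W(\phi_{\pi_i\pi_{i+1}}).$$

Further, when $A$ is an arbitrary set of permutations,

$$K(H_0, H_A) = \sum_{i=1}^{m-1} W(\phi_{i,i+1}) - \max_{\pi \in A} \sum_{i=1}^{m-1} W(\phi_{\pi_i\pi_{i+1}}). \tag{25}$$

Thus we can evaluate the effect of considering neighboring genes for
the case of radiation hybrids just as we did in the case of sperm
typing, with $W$ playing the role of $-V$.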
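
For three genes, (25) reduces to a comparison of three values of $W$.
The sketch below is an illustration with the hypothetical function
name `r3`; it takes the distances between adjacent genes and uses
$\lambda = 1.0$ and $r = 0.4$ as defaults, the values quoted for
Table 2.

```python
import math


def W(phi, r):
    """Maximum expected log conditional probability, formula (24)."""
    rbar = 1.0 - r
    return (r * (1 - rbar * phi) * math.log(1 - rbar * phi)
            + r * rbar * phi * math.log(r * rbar * phi ** 2)
            + rbar * (1 - r * phi) * math.log(1 - r * phi))


def r3(delta_12, delta_23, lam=1.0, r=0.4):
    """KL number (25) for three genes in true order (1,2,3)."""
    def sep(d):
        return 1.0 - math.exp(-lam * d)  # separation probability (22)

    w12 = W(sep(delta_12), r)
    w23 = W(sep(delta_23), r)
    w13 = W(sep(delta_12 + delta_23), r)  # distances add, formula (23)
    true_value = w12 + w23
    # Alternatives: order (1,3,2) uses pairs (1,3),(3,2);
    # order (2,1,3) uses pairs (2,1),(1,3).
    best_alternative = max(w13 + w23, w12 + w13)
    return true_value - best_alternative


value_030 = r3(0.30, 0.30)
value_010 = r3(0.10, 0.10)
```

These two calls give about 0.144 and 0.109, matching the corresponding
$R_3$ entries of Table 2.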


5  Calculations 

The Kullback-Leibler numbers for ordering the three genes (2,3,4)
using sperm typing were calculated for various values of $\phi$,
yielding $S_3 = S_3(\phi_{23}, \phi_{34})$,
$S_4 = S_4(\phi_{23}, \phi_{34}, \phi_{45})$ and $S_5 = S_5(\phi)$,
when considering 3, 4, and 5 genes respectively.

The values of $S_3(a,a)$ peak at $S_3(0.12, 0.12) = 0.149$, but this
peak is rather broad, since $S_3(a,a)$ is 0.103 at a = 0.04 and 0.099
at a = 0.25. The function $S_3(a,b)$ drops rapidly as a and b
separate. It seems that $S_4(\phi_{23}, \phi_{34}, \phi_{45})$ is no
improvement over $S_3(\phi_{23}, \phi_{34})$ when
$\phi_{23} \le \phi_{34}$. When $\phi_{23} > \phi_{34}$, there is room
for substantial improvement by including gene 5. However, in those
cases, also including gene 1 rarely gives additional gain.

If $\phi_{23} = \phi_{34}$ is kept fixed, then $S_5$, regarded as a
function of $\phi_{12}$ and $\phi_{45}$, is constant along squares for
which the diagonal is along $\phi_{12} = \phi_{45}$. The function
$S_5(a,a,a,a)$ attains a maximum value of 0.231 at a = 0.10. For fixed
a, $S_5(b,a,a,b)$ peaks at b = b(a) where $b(a) \approx 1.4a$. This
value in turn has a peak of 0.258 at a = 0.10 and b = 0.14.

If $\phi_{23}$ is substantially larger than $\phi_{34}$, then
$S_5(\phi) = S_4(\phi_{23}, \phi_{34}, \phi_{45})$. If $\phi_{23}$ is
not much larger than $\phi_{34}$, the introduction of gene 1 begins
to have some effect if $\phi_{45}$ is rather close to optimal for
$S_4$ and $\phi_{12}$ is neither very small nor very large. There is
another way to look at this phenomenon. If $\phi_{23}$ and
$\phi_{34} < \phi_{23}$ are kept fixed, then $S_5$ is constant along
rectangles in the $(\phi_{12}, \phi_{45})$ space. As $\phi_{34}$
decreases, these rectangles become elongated along the $\phi_{12}$
direction, and some of these rectangles degenerate to lines for small
and large values of $\phi_{45}$. When $\phi_{34}$ decreases enough,
all the rectangles degenerate, and the level lines become parallel
lines and $S_5$ is independent of $\phi_{12}$.

In summary, if $\phi_{23} \approx \phi_{34}$, consideration of five
genes is required to get improvement over that of three genes. If
$\phi_{23}$ is considerably different from $\phi_{34}$, one extra gene
on the side of the two adjacent genes can give improvement, but the
gene on the other side will not help. Table 1 presents the results for
some cases and illustrates these comments.

The qualitative results for the use of radiation hybrid mapping are
similar to those for sperm typing. Table 2 presents some results. The
KL information depends on $\lambda\delta$ and $r$. In our table we take
$\lambda = 1.0$ and $r = 0.4$, and we present the KL numbers
$R_3 = R_3(\delta_{23}, \delta_{34})$,
$R_4 = R_4(\delta_{23}, \delta_{34}, \delta_{45})$ and
$R_5 = R_5(\delta)$. We note that the peak of $R_3(a,a)$ is 0.144 at
a = 0.28 while values of a at 0.10 and 0.70 give 0.109 and 0.096. The
peak value of $R_5(a,a,a,a)$ is 0.224 at a = 0.22, while the peak
value of $R_5(b,a,a,b)$ is 0.250 at a = 0.23, b = 0.34.

These results have obvious potential application in selecting
appropriate doses of radiation to increase the information content. Of
course, the broad peak of $R_3$ indicates that KL values are not very
sensitive to the choice of $\lambda$. The tables indicate that there
are circumstances where considering 4 or 5 genes may double the
information content, but also suggest that often there is little to
gain by considering five or more genes simultaneously. The tables can
easily be supplemented, since the calculations of the KL numbers are
easily implemented.
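
As one example of how such calculations might be carried out, the
sketch below scans $S_3(a,a) = V(2a(1 - a)) - V(a)$, which follows from
(11) and (13), over a grid of values of a. The function name
`s3_equal` and the grid with step 0.01 are assumptions of the
illustration.

```python
import math


def V(x):
    """Entropy (4), with the limiting value 0 assumed at x = 0 and x = 1."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -(x * math.log(x) + (1.0 - x) * math.log(1.0 - x))


def s3_equal(a):
    """S_3(a, a) for sperm typing with equal adjacent probabilities a."""
    phi_13 = 2.0 * a * (1.0 - a)  # recursion (11) with phi_12 = phi_23 = a
    return V(phi_13) - V(a)


grid = [k / 100 for k in range(1, 50)]  # a = 0.01, 0.02, ..., 0.49
a_peak = max(grid, key=s3_equal)
peak_value = s3_equal(a_peak)
```

On this grid the peak is found near a = 0.12, and the values at
a = 0.04 and a = 0.25 are about 0.103 and 0.099, which shows how flat
the peak is.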
Table 1. KL information for ordering genes (2,3,4)
using sperm typing considering 3, 4, and 5 genes

$\phi_{12}$   $\phi_{23}$   $\phi_{34}$   $\phi_{45}$   $S_3$   $S_4$   $S_5$

.01           .01           .01           .01           .041    .041    .077
.02           .02           .02           .02           .067    .067    .122
.04           .04           .04           .04           .103    .103    .180
.10           .10           .10           .10           .146    .146    .231
.14           .14           .14           .14           .147    .147    .217
.20           .20           .20           .20           .127    .127    .169
.25           .25           .25           .25           .099    .099    .123
.14           .10           .10           .10           .146    .146    .258
.20           .14           .20           .14           .147    .147    .239
.10           .01           .01           .01           .041    .041    .059
.10           .02           .02           .01           .067    .067    .096
.10           .02           .02           .10           .067    .067    .101
x             .02           .01           .01           .035    .067    .067
x             .04           .02           .02           .055    .101    .101
x             .04           .02           .04           .055    .109    .109
x             .04           .02           .10           .055    .088    .088
x             .10           .04           .02           .065    .092    .092
x             .10           .04           .04           .065    .117    .117
x             .10           .04           .10           .065    .130    .130
x             .10           .08           .04           .121    .162    .162
y             .10           .08           .08           .121    .168    .199
z             .10           .08           .10           .121    .168    .227
y             .10           .08           .15           .121    .168    .207
x             .10           .08           .25           .121    .161    .161

x represents any value in the interval $(0, \infty)$
y represents any value in an interval containing (0.04, 0.25)
z represents any value in an interval containing (0.08, 0.20)
Table 2. KL information for ordering genes (2,3,4) using
radiation hybrid mapping considering 3, 4, and 5 genes

$\delta_{12}$   $\delta_{23}$   $\delta_{34}$   $\delta_{45}$   $R_3$   $R_4$   $R_5$

.01             .01             .01             .01             .023    .023    .044
.02             .02             .02             .02             .040    .040    .073
.10             .10             .10             .10             .109    .109    .180
.20             .20             .20             .20             .139    .139    .223
.30             .30             .30             .30             .144    .144    .217
.50             .50             .50             .50             .125    .125    .168
1.50            1.50            1.50            1.50            .024    .024    .025
.34             .23             .23             .23             .143    .143    .250
.20             .04             .04             .04             .064    .064    .099
.20             .10             .10             .04             .109    .109    .143
.20             .10             .10             .20             .109    .109    .188
x               .10             .04             .01             .048    .059    .059
x               .20             .10             .04             .079    .105    .105
x               .20             .10             .10             .079    .139    .139
x               .20             .10             .20             .079    .158    .158
x               .40             .20             .10             .084    .114    .114
x               .40             .20             .20             .084    .137    .137
x               .40             .20             .40             .084    .168    .168
x               .20             .15             .04             .111    .134    .134
y               .20             .15             .10             .111    .161    .164
z               .20             .15             .20             .111    .161    .206
w               .20             .15             .40             .111    .161    .178
x               .20             .15             .60             .111    .153    .153

x represents any value in the interval $(0, \infty)$
y represents any value in an interval containing (0.01, 1.50)
z represents any value in an interval containing (0.10, 0.64)
w represents any value in an interval containing (0.04, 1.10)